
IMPROVED METHODS AND APPARATUS FOR FAST FOURIER TRANSFORM 

The invention provides improved methods and apparatus for the fast Fourier transform (FFT).

From the user's perspective, the code performs an in-place "split-complex" 1D FFT (forward or
inverse) for power-of-2 sizes ranging from 16 to 4096, inclusive.

There are 3 user-callable functions: fft_setup(), fft_z() and fft_free():

void fft_setup ( unsigned long LOG2N, FFT_setup *SETUP );
void fft_z ( float *Creal, float *Cimag, unsigned long LOG2N, FFT_setup *SETUP );
void fft_free ( FFT_setup *SETUP );

FFT_setup is a structure defined as follows: 

typedef struct {
    float *twidp;            /* pointer to 16-byte aligned
                                malloc'ed twiddle buffer */
    unsigned char *bitrp;    /* pointer to static bit-reversal
                                table */
} FFT_setup;

A user first calls fft_setup(), specifying a particular FFT size (actually, the base 2 log of the size)
along with a pointer to an uninitialized FFT_setup structure. This function allocates (malloc) and
builds the appropriate "twiddle" table and places a pointer to this table and the appropriate
bit-reversal table (a static table) in the FFT_setup structure supplied by the caller.

Next, fft_z() can be called repeatedly for the same size FFT as was specified in the
corresponding call to fft_setup(). The user must also specify the same FFT_setup structure that
was filled in by that call. The input/output vectors are supplied in a split-complex format with
the real parts contiguous in the first float vector argument (Creal) and the corresponding
imaginary parts contiguous in the second float vector argument (Cimag). The call performs a
forward FFT. To perform an inverse FFT, simply interchange the real and imaginary vectors
(i.e., specify the imaginary vector in the first argument and the real vector in the second
argument).
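The argument-swapping rule for the inverse transform can be checked on a small case. The following is a minimal sketch, not the fft_z() code: dft4_forward() is a hypothetical naive 4-point forward transform on split-complex data, and swap_round_trip_ok() is a hypothetical helper. It assumes an unnormalized transform, so the round trip returns the input scaled by N = 4; the text above does not state how fft_z() scales its output.

```c
#include <stddef.h>

/* Hypothetical stand-in for fft_z(): naive in-place 4-point forward DFT on
 * split-complex data.  The N = 4 twiddles are exactly 0 and +/-1. */
static void dft4_forward(float *re, float *im)
{
    /* cos(2*pi*m/4) and sin(2*pi*m/4) for m = 0..3 */
    static const float c4[4] = { 1.0f, 0.0f, -1.0f, 0.0f };
    static const float s4[4] = { 0.0f, 1.0f, 0.0f, -1.0f };
    float out_re[4], out_im[4];
    size_t j, k;

    for (k = 0; k < 4; k++) {
        out_re[k] = 0.0f;
        out_im[k] = 0.0f;
        for (j = 0; j < 4; j++) {
            size_t m = (j * k) % 4;
            /* (re + i*im) * (cos - i*sin) */
            out_re[k] += re[j] * c4[m] + im[j] * s4[m];
            out_im[k] += im[j] * c4[m] - re[j] * s4[m];
        }
    }
    for (k = 0; k < 4; k++) {
        re[k] = out_re[k];
        im[k] = out_im[k];
    }
}

/* Forward call, then the same routine with the two vectors interchanged.
 * Returns 1 when the result equals the original input scaled by 4. */
static int swap_round_trip_ok(void)
{
    const float orig_re[4] = { 1.0f, 2.0f, -3.0f, 0.5f };
    const float orig_im[4] = { 0.0f, -1.0f, 4.0f, 2.0f };
    float re[4], im[4];
    size_t k;

    for (k = 0; k < 4; k++) {
        re[k] = orig_re[k];
        im[k] = orig_im[k];
    }
    dft4_forward(re, im);    /* forward transform */
    dft4_forward(im, re);    /* inverse: imaginary vector first, real second */
    for (k = 0; k < 4; k++) {
        if (re[k] != 4.0f * orig_re[k] || im[k] != 4.0f * orig_im[k])
            return 0;
    }
    return 1;
}
```

The swap works because interchanging the two vectors is equivalent to conjugating the data before and after the transform, so no separate inverse routine or second twiddle table is needed.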

Finally, the user calls fft_free() to free the twiddle buffer previously allocated and constructed by
fft_setup(). The user must specify the same FFT_setup structure to both calls.
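The pairing of an aligned, malloc'ed twiddle buffer with a later free can be sketched as follows. This is an illustration under assumptions, not the listing from fft_setup.c: aligned16_alloc() and aligned16_free() are hypothetical names, and the padding is computed from sizeof(char *) so that the sketch also works where pointers are wider than 4 bytes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return at least nbytes of 16-byte aligned storage.
 * The pointer obtained from malloc() is stored just below the aligned block
 * so that aligned16_free() can recover it later. */
static void *aligned16_alloc(size_t nbytes)
{
    char *raw = malloc(nbytes + sizeof(char *) + 15);
    char *aligned;

    if (!raw)
        return NULL;
    /* skip room for the stashed pointer, then round down to 16 bytes */
    aligned = (char *)(((uintptr_t)raw + sizeof(char *) + 15) & ~(uintptr_t)15);
    ((char **)aligned)[-1] = raw;
    return aligned;
}

/* Hypothetical helper: release a block obtained from aligned16_alloc(). */
static void aligned16_free(void *p)
{
    if (p)
        free(((char **)p)[-1]);
}
```

A caller would obtain the buffer once per FFT size, reuse it across transforms, and release it exactly once, which mirrors the requirement that the same FFT_setup structure be passed to both the setup and free calls.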

Here is a one line description of what is in each file: 

fft.h: user's header file
fft_bitr.c: contains static bit-reversal tables for all 9 FFT sizes (16 - 4096)
fft_setup.c: source for fft_setup() and fft_free()
fft_z.c: source for fft_z()
ppc_vmx.h: macro header file for VMX (AltiVec) emulation of SIMD instructions
ppc_vmx.c: contains C functions that emulate VMX (AltiVec) SIMD instructions

Note that fft_z() is implemented using macros that emulate [...] a
structure (VMX_reg) defined in ppc_vmx.h that emulates a 16-byte VMX SIMD register. The
floating point variables used in fft_z() [...]
PPC G4 implementation of fft_z() insofar as the instructions are *not* ordered in an optimal way
for that processor. However, the primary patent claim is clearly demonstrated in the final pass of
the FFT, which begins on line 661 of fft_z.c. This section performs the final radix-4 in-place pass
of the FFT but manages to leave the results correctly ordered in the real and imaginary
input/output vectors. This can be accomplished with 32 or fewer 16-byte "registers" (i.e., 512 or
fewer bytes of temporary storage).
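The register-only reordering in the final pass relies on two rounds of merge operations over four 4-word vectors. The sketch below emulates that permute in portable C. The type vec4 and the functions merge_high(), merge_low() and permute4x4() are hypothetical stand-ins for the VMX_reg structure and the VMRGHW/VMRGLW macros, whose definitions in ppc_vmx.h are not reproduced in this section; the word-interleaving behavior is assumed to match the AltiVec merge-high and merge-low instructions.

```c
/* Hypothetical emulated 4-word register. */
typedef struct { float w[4]; } vec4;

/* Merge high words: { a0, b0, a1, b1 } */
static vec4 merge_high(vec4 a, vec4 b)
{
    vec4 r = { { a.w[0], b.w[0], a.w[1], b.w[1] } };
    return r;
}

/* Merge low words: { a2, b2, a3, b3 } */
static vec4 merge_low(vec4 a, vec4 b)
{
    vec4 r = { { a.w[2], b.w[2], a.w[3], b.w[3] } };
    return r;
}

/* Two rounds of merges over rows a[0..3], in place.  Afterwards row j holds
 * column j of the input in the order (row 0, row 2, row 1, row 3), that is,
 * a 4 x 4 transpose with the two middle columns swapped. */
static void permute4x4(vec4 a[4])
{
    vec4 p0 = merge_high(a[0], a[1]);
    vec4 p1 = merge_high(a[2], a[3]);
    vec4 p2 = merge_low(a[0], a[1]);
    vec4 p3 = merge_low(a[2], a[3]);

    a[0] = merge_high(p0, p1);
    a[1] = merge_low(p0, p1);
    a[2] = merge_high(p2, p3);
    a[3] = merge_low(p2, p3);
}
```

The order 0, 2, 1, 3 is the 2-bit bit-reversal of 0, 1, 2, 3, which suggests why the permute folds part of the output reordering into the butterfly itself. Eight merges per 4 x 4 block need only the four input registers plus four temporaries.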

It will be appreciated that the teachings hereof may be applied using different programming 
languages, toolsets, operating systems, platforms and otherwise. 




/****************************************************************************
 * File Name:    fft.h
 * Description:  Header file for FFT functions
 *
 * Mercury Computer Systems, Inc.
 * Copyright (c) 1999 All rights reserved
 *
 * Revision    Date      Engineer; Reason
 * --------    ----      ----------------
 * 0.0         991119    jg; Created
 ****************************************************************************/

/*
 * FFT setup structure
 *   contains pointers to twiddles and bit-reversed indices
 *   pointers are filled in by fft_setup() function
 */
typedef struct {
    float *twidp;
    unsigned char *bitrp;
} FFT_setup;

/*
 * FFT function prototypes
 */
void fft_free( FFT_setup *SETUP );
void fft_setup( unsigned long LOG2N, FFT_setup *SETUP );
void fft_z( float *Cr, float *Ci, unsigned long LOG2N, FFT_setup *SETUP );

/****************************************************************************
 * File Name:    fft_bitr.c
 * Description:  Special bit-reversed tables for FFT sizes
 *               4 <= LOG2N <= 12
 *
 *   Let:  LOG2M = LOG2N - 4
 *         M = 2 ^ LOG2M
 *
 *   For each table:
 *     section 1:
 *       n1 = bitr[0] = # of elements in section 1
 *       (The first and second elements are not in the table
 *        as they are known to be 0 and M-1, respectively.)
 *
 *       0, M-1, bitr[1], ... bitr[n1-2] =
 *           indices that bit-reverse to themselves
 *
 *     section 2:
 *       n2 = bitr[n1-1] = # of elements in section 2
 *       It's always true that n1 + n2 = M.
 *       (The first element is not in the table and, if
 *        n2 != 0, is known to be 1.)
 *
 *       (1, bitr[n1]), (bitr[n1+1], bitr[n1+2]), ...
 *       (bitr[M-3], bitr[M-2]) = n2/2 pairs of indices that
 *           bit-reverse to each other.  bitr[M-1] = 0.
 *
 * Mercury Computer Systems, Inc.
 * Copyright (c) 1996 All rights reserved
 *
 * Revision    Date      Engineer; Reason
 * --------    ----      ----------------
 * 0.0         990716    jg; Created
 ****************************************************************************/


/*
 * Table for M = 1 (N = 16).
 */
unsigned char _fft_bitr_1[] = {
    1,
    0, 0, 0
};

/*
 * Table for M = 2 (N = 32).
 */
unsigned char _fft_bitr_2[] = {
    2,
    0, 0, 0
};

/*
 * Table for M = 4 (N = 64).
 */
unsigned char _fft_bitr_4[] = {
    2,
    2, 2, 0
};

/*
 * Table for M = 8 (N = 128).
 */
unsigned char _fft_bitr_8[] = {
    4, 2, 5,
    4, 4, 3, 6, 0
};

/*
 * Table for M = 16 (N = 256).
 */
unsigned char _fft_bitr_16[] = {
    4, 6, 9,
    12, 8, 2, 4, 3, 12, 5, 10, 7, 14, 11, 13, 0
};

/*
 * Table for M = 32 (N = 512).
 */
unsigned char _fft_bitr_32[] = {
    8, 4, 10, 14, 17, 21, 27,
    24, 16, 2, 8, 3, 24, 5, 20, 6, 12, 7, 28,
    9, 18, 11, 26, 13, 22, 15, 30, 19, 25, 23, 29, 0
};




/*
 * Table for M = 64 (N = 1024).
 */
unsigned char _fft_bitr_64[] = {
    8, 12, 18, 30, 33, 45, 51,
    56, 32, 2, 16, 3, 48, 4, 8, 5, 40, 6, 24,
    7, 56, 9, 36, 10, 20, 11, 52, 13, 44, 14, 28,
    15, 60, 17, 34, 19, 50, 21, 42, 22, 26, 23, 58,
    25, 38, 27, 54, 29, 46, 31, 62, 35, 49, 37, 41,
    39, 57, 43, 53, 47, 61, 55, 59, 0
};


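/*
 * Table for M = 128 (N = 2048).
 */
unsigned char _fft_bitr_128[] = {
    16, 8, 20, 28, 34, 42, 54, 62, 65, 73, 85, 93, 99, 107, 119,
    112, 64, 2, 32, 3, 96, 4, 16, 5, 80, 6, 48, 7, 112, 9, 72,
    10, 40, 11, 104, 12, 24, 13, 88, 14, 56, 15, 120, 17, 68, 18, 36,
    19, 100, 21, 84, 22, 52, 23, 116, 25, 76, 26, 44, 27, 108, 29, 92,
    30, 60, 31, 124, 33, 66, 35, 98, 37, 82, 38, 50, 39, 114, 41, 74,
    43, 106, 45, 90, 46, 58, 47, 122, 49, 70, 51, 102, 53, 86, 55, 118,
    57, 78, 59, 110, 61, 94, 63, 126, 67, 97, 69, 81, 71, 113, 75, 105,
    77, 89, 79, 121, 83, 101, 87, 117, 91, 109, 95, 125, 103, 115, 111,
    123, 0
};
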



/*
 * Table for M = 256 (N = 4096).
 */
unsigned char _fft_bitr_256[] = {
    16, 24, 36, 60, 66, 90, 102, 126, 129, 153, 165, 189, 195, 219, 231,
    240, 128, 2, 64, 3, 192, 4, 32, 5, 160, 6, 96, 7, 224, 8, 16,
    9, 144, 10, 80, 11, 208, 12, 48, 13, 176, 14, 112, 15, 240, 17, 136,
    18, 72, 19, 200, 20, 40, 21, 168, 22, 104, 23, 232, 25, 152, 26, 88,
    27, 216, 28, 56, 29, 184, 30, 120, 31, 248, 33, 132, 34, 68, 35, 196,
    37, 164, 38, 100, 39, 228, 41, 148, 42, 84, 43, 212, 44, 52, 45, 180,
    46, 116, 47, 244, 49, 140, 50, 76, 51, 204, 53, 172, 54, 108, 55, 236,
    57, 156, 58, 92, 59, 220, 61, 188, 62, 124, 63, 252, 65, 130, 67, 194,
    69, 162, 70, 98, 71, 226, 73, 146, 74, 82, 75, 210, 77, 178, 78, 114,
    79, 242, 81, 138, 83, 202, 85, 170, 86, 106, 87, 234, 89, 154, 91, 218,
    93, 186, 94, 122, 95, 250, 97, 134, 99, 198, 101, 166, 103, 230, 105, 150,
    107, 214, 109, 182, 110, 118, 111, 246, 113, 142, 115, 206, 117, 174, 119,
    238, 121, 158, 123, 222, 125, 190, 127, 254, 131, 193, 133, 161, 135, 225, 137,
    145, 139, 209, 141, 177, 143, 241, 147, 201, 149, 169, 151, 233, 155, 217, 157,
    185, 159, 249, 163, 197, 167, 229, 171, 213, 173, 181, 175, 245, 179, 205, 183,
    237, 187, 221, 191, 253, 199, 227, 203, 211, 207, 243, 215, 235, 223, 251, 239,
    247, 0
};
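
The two-section table layout described in the fft_bitr.c header comment can be regenerated from the bit-reversal permutation itself. The following is a sketch under that reading of the layout; bit_reverse() and build_bitr_table() are hypothetical helpers that do not appear in the listings, and the sketch only covers 2 <= LOG2M <= 8 (the M = 1 and M = 2 tables above are padded to four entries and are not reproduced by it).

```c
/* Hypothetical helper: reverse the low nbits bits of i. */
static unsigned bit_reverse(unsigned i, unsigned nbits)
{
    unsigned r = 0;

    while (nbits--) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Hypothetical helper: fill table[0..M-1] for M = 2^log2m, 2 <= log2m <= 8. */
static void build_bitr_table(unsigned log2m, unsigned char *table)
{
    unsigned m = 1u << log2m;
    unsigned i, pos = 1;

    /* section 1: self-reversing indices; 0 and M-1 are implied */
    for (i = 1; i + 1 < m; i++) {
        if (bit_reverse(i, log2m) == i)
            table[pos++] = (unsigned char)i;
    }
    table[0] = (unsigned char)(pos + 1);          /* n1 */
    table[pos++] = (unsigned char)(m - table[0]); /* n2 = M - n1 */

    /* section 2: pairs (i, reverse(i)) with i < reverse(i);
     * the leading 1 of the first pair is implied */
    for (i = 1; i + 1 < m; i++) {
        unsigned r = bit_reverse(i, log2m);

        if (i < r) {
            if (i != 1)
                table[pos++] = (unsigned char)i;
            table[pos++] = (unsigned char)r;
        }
    }
    table[pos] = 0;                               /* final entry, index M-1 */
}
```

For LOG2M = 3 this yields 4, 2, 5, 4, 4, 3, 6, 0, which agrees with the M = 8 table above.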



/****************************************************************************
 * File Name:    fft_setup.c
 * Description:  split complex in-place FFT
 *
 * Entry/params: void fft_setup ( ulong LOG2N, FFT_setup *SETUP )
 * Entry/params: void fft_free ( FFT_setup *SETUP )
 *
 * Formula:
 *   LOG2N is the log (base 2) of the FFT size.
 *   (4 <= LOG2N <= 12)
 *
 *   Let:  N = 2 ^ LOG2N
 *         LOG2M = LOG2N - 4
 *         M = 2 ^ LOG2M
 *         A = 2 * PI / N
 *         BITR( i, m ) = bit-reversal of unsigned integer i
 *                        over m bits
 *
 *   void fft_setup ( ulong LOG2N, FFT_setup *SETUP )
 *
 *     SETUP->twidp is set to an allocated buffer that is
 *     16-byte aligned and contains M sets of 4 x 4 floating
 *     point twiddles arranged exactly as follows:
 *
 *       cos(kA),  cos((k+1)A),  cos((k+2)A),  cos((k+3)A),
 *       sin(kA),  sin((k+1)A),  sin((k+2)A),  sin((k+3)A),
 *       cos(2kA), cos(2(k+1)A), cos(2(k+2)A), cos(2(k+3)A),
 *       sin(2kA), sin(2(k+1)A), sin(2(k+2)A), sin(2(k+3)A)
 *
 *         for k = 0
 *
 *       cos(kA),  cos((k+1)A),  cos((k+2)A),  cos((k+3)A),
 *       tan(kA),  tan((k+1)A),  tan((k+2)A),  tan((k+3)A),
 *       cot(2kA), cot(2(k+1)A), cot(2(k+2)A), cot(2(k+3)A),
 *       sin(2kA), sin(2(k+1)A), sin(2(k+2)A), sin(2(k+3)A)
 *
 *         for k = 4 * BITR( 1, LOG2M ),
 *                 4 * BITR( 2, LOG2M ),
 *                 ...
 *                 4 * BITR( M-2, LOG2M )
 *
 *       cos(kA),  cos((k+1)A),  cos((k+2)A),  cos((k+3)A),
 *       sin(kA),  sin((k+1)A),  sin((k+2)A),  sin((k+3)A),
 *       cos(2kA), cos(2(k+1)A), cos(2(k+2)A), cos(2(k+3)A),
 *       sin(2kA), sin(2(k+1)A), sin(2(k+2)A), sin(2(k+3)A)
 *
 *         for k = 4 * (M - 1)
 *
 *     SETUP->bitrp is set to static table of M unsigned char
 *     bit-reversed index values (LOG2M bits) arranged
 *     as follows:
 *
 *       section 1:
 *         n1 = bitrp[0] = # of elements in section 1
 *         (The first and second elements are not in the table
 *          as they are known to be 0 and M-1, respectively.)
 *
 *         0, M-1, bitrp[1], ... bitrp[n1-2] =
 *             indices that bit-reverse to themselves
 *
 *       section 2:
 *         n2 = bitrp[n1-1] = # of elements in section 2
 *         It's always true that n1 + n2 = M.
 *         (The first element is not in the table and, if
 *          n2 != 0, is known to be 1.)
 *
 *         (1, bitrp[n1]), (bitrp[n1+1], bitrp[n1+2]), ...
 *         (bitrp[M-3], bitrp[M-2]) = n2/2 pairs of indices that
 *             bit-reverse to each other.  bitrp[M-1] = 0.
 *
 *   void fft_free ( FFT_setup *SETUP )
 *
 *     frees SETUP->twidp and sets SETUP->twidp and
 *     SETUP->bitrp to 0
 *
 * Mercury Computer Systems, Inc.
 * Copyright (c) 1999 All rights reserved
 *
 * Revision    Date      Engineer; Reason
 * --------    ----      ----------------
 * 0.0         991119    jg; Created
 ****************************************************************************/


#include <malloc.h>
#include <math.h>
#include "fft.h"
#include "ppc_vmx.h"

#define TWOPI (double)6.2831853071795864769252868

/* bit-reverse the low log2x bits of index into bitr_index */
#define BITR( log2x, index, bitr_index ) \
{ \
    ulong _bitr_i, _bitr_x; \
    _bitr_x = (index); \
    bitr_index = 0; \
    for ( _bitr_i = 0; _bitr_i < (log2x); _bitr_i++ ) { \
        bitr_index <<= 1; \
        bitr_index |= (_bitr_x & 1); \
        _bitr_x >>= 1; \
    } \
}

extern uchar _fft_bitr_1[];
extern uchar _fft_bitr_2[];
extern uchar _fft_bitr_4[];
extern uchar _fft_bitr_8[];
extern uchar _fft_bitr_16[];
extern uchar _fft_bitr_32[];
extern uchar _fft_bitr_64[];
extern uchar _fft_bitr_128[];
extern uchar _fft_bitr_256[];

void fft_setup( ulong LOG2N, FFT_setup *SETUP )
{
    char **mallocp;
    char *buffer;
    float *twidp;
    ulong bitr_i, i, j, log2n_m4, n, nv16;
    double angle, cos1, cos2, delta, incr, sin1, sin2, twopivn;

    n = 1 << LOG2N;

    buffer = malloc( (n * sizeof(float)) + 20 );
    if ( !buffer ) {
        SETUP->twidp = (float *)0;
        return;
    }

    /* align to 16 bytes and stash the malloc'ed pointer just below */
    twidp = (float *)((ulong)(buffer + 20) & ~15);
    mallocp = (char **)(twidp - 1);
    *mallocp = buffer;

    nv16 = n >> 4;
    log2n_m4 = LOG2N - 4;
    twopivn = TWOPI / (double)n;
    delta = (double)0.0;

    for ( i = 0; i < nv16; i++ ) {
        for ( j = 0; j < 4; j++ ) {
            incr = delta;
            angle = twopivn * incr;
            cos1 = cos(angle);
            sin1 = sin(angle);
            incr += delta;
            angle = twopivn * incr;
            cos2 = cos(angle);
            sin2 = sin(angle);

            if ( (i == 0) || (i == (nv16 - 1)) ) {
                /* first and last sets: cosine/sine weights */
                twidp[(i << 4) + j] = (float)cos1;
                twidp[(i << 4) + j + 4] = (float)sin1;
                twidp[(i << 4) + j + 8] = (float)cos2;
                twidp[(i << 4) + j + 12] = (float)sin2;
            }
            else {
                /* interior sets: bit-reversed position, tan/cot weights */
                BITR( log2n_m4, i, bitr_i )
                twidp[(bitr_i << 4) + j] = (float)cos1;
                twidp[(bitr_i << 4) + j + 4] = (float)(sin1 / cos1);
                twidp[(bitr_i << 4) + j + 8] = (float)(cos2 / sin2);
                twidp[(bitr_i << 4) + j + 12] = (float)sin2;
            }
            delta += (double)1.0;
        }
    }

    SETUP->twidp = twidp;
    if ( LOG2N == 4 )
        SETUP->bitrp = _fft_bitr_1;
    else if ( LOG2N == 5 )
        SETUP->bitrp = _fft_bitr_2;
    else if ( LOG2N == 6 )
        SETUP->bitrp = _fft_bitr_4;
    else if ( LOG2N == 7 )
        SETUP->bitrp = _fft_bitr_8;
    else if ( LOG2N == 8 )
        SETUP->bitrp = _fft_bitr_16;
    else if ( LOG2N == 9 )
        SETUP->bitrp = _fft_bitr_32;
    else if ( LOG2N == 10 )
        SETUP->bitrp = _fft_bitr_64;
    else if ( LOG2N == 11 )
        SETUP->bitrp = _fft_bitr_128;
    else if ( LOG2N == 12 )
        SETUP->bitrp = _fft_bitr_256;

    return;
}


void fft_free( FFT_setup *SETUP )
{
    char **mallocp;

    if ( (SETUP->bitrp == _fft_bitr_1) ||
         (SETUP->bitrp == _fft_bitr_2) ||
         (SETUP->bitrp == _fft_bitr_4) ||
         (SETUP->bitrp == _fft_bitr_8) ||
         (SETUP->bitrp == _fft_bitr_16) ||
         (SETUP->bitrp == _fft_bitr_32) ||
         (SETUP->bitrp == _fft_bitr_64) ||
         (SETUP->bitrp == _fft_bitr_128) ||
         (SETUP->bitrp == _fft_bitr_256) ) {
        /* recover the malloc'ed pointer stashed below the twiddles */
        mallocp = (char **)(SETUP->twidp - 1);
        free( *mallocp );
    }
    SETUP->twidp = (float *)0;
    SETUP->bitrp = (uchar *)0;
    return;
}
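
The interior twiddle sets built by fft_setup() hold tan(kA) and cot(2kA) in place of sin(kA) and cos(2kA). One way to see how such weights can be used is the factored form sketched below, which produces a twiddled sum and difference from one shared intermediate. This is an illustrative sketch in scalar C rather than the macro code of fft_z.c; the type cplx and the functions sum_diff_direct(), sum_diff_factored() and close_enough() are hypothetical, and the weight is taken to be cos - i*sin.

```c
/* Hypothetical scalar complex type. */
typedef struct { float re, im; } cplx;

/* Reference: sum = a0 + w*a2 and diff = a0 - w*a2 with w = c - i*s. */
static void sum_diff_direct(cplx a0, cplx a2, float c, float s,
                            cplx *sum, cplx *diff)
{
    float pr = a2.re * c + a2.im * s;
    float pi = a2.im * c - a2.re * s;

    sum->re = a0.re + pr;
    sum->im = a0.im + pi;
    diff->re = a0.re - pr;
    diff->im = a0.im - pi;
}

/* Factored form using cot = c / s (requires s != 0).  The intermediate
 * (xr, xi) is shared by the sum and the difference, so each of the six
 * lines below is a single multiply-add or negated multiply-subtract. */
static void sum_diff_factored(cplx a0, cplx a2, float cot, float s,
                              cplx *sum, cplx *diff)
{
    float xr = cot * a2.re + a2.im;
    float xi = a2.re - cot * a2.im;

    sum->re = s * xr + a0.re;
    sum->im = a0.im - s * xi;
    diff->re = a0.re - s * xr;
    diff->im = s * xi + a0.im;
}

/* Hypothetical helper: absolute difference below a small tolerance. */
static int close_enough(float a, float b)
{
    float d = a - b;

    if (d < 0.0f)
        d = -d;
    return d < 1e-5f;
}
```

The factored form breaks down when the sine is zero, which is consistent with the k = 0 set keeping plain cosine/sine weights in the table layout above.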

/****************************************************************************
 * File Name:    fft_z.c
 * Description:  Split complex in-place 1D FFT
 *
 * Entry/params: void fft_z ( float *Cr, float *Ci, ulong LOG2N,
 *                            FFT_setup *SETUP )
 *
 * Formula:
 *   Cr/Ci = 2^LOG2N-point (4 <= LOG2N <= 12) forward in-place
 *           complex 1D FFT of the split complex vector stored
 *           in Cr and Ci.
 *           (Note, an inverse FFT can be performed by swapping
 *           Cr and Ci.)
 *
 *   where:
 *     Cr and Ci must be 16-byte aligned and have unit
 *     stride between adjacent real (Cr) and imaginary (Ci)
 *     points.
 *
 *     LOG2N is the log (base 2) of the FFT size.
 *     (4 <= LOG2N <= 12)
 *
 *     Let:  N = 2 ^ LOG2N
 *           LOG2M = LOG2N - 4
 *           M = 2 ^ LOG2M
 *           A = 2 * PI / N
 *           BITR( i, m ) = bit-reversal of unsigned integer i
 *                          over m bits
 *
 *     SETUP->twidp is a 16-byte aligned pointer to M sets
 *     of 4 x 4 floating point twiddles arranged exactly
 *     as follows:
 *
 *       cos(kA),  cos((k+1)A),  cos((k+2)A),  cos((k+3)A),
 *       sin(kA),  sin((k+1)A),  sin((k+2)A),  sin((k+3)A),
 *       cos(2kA), cos(2(k+1)A), cos(2(k+2)A), cos(2(k+3)A),
 *       sin(2kA), sin(2(k+1)A), sin(2(k+2)A), sin(2(k+3)A)
 *
 *         for k = 0
 *
 *       cos(kA),  cos((k+1)A),  cos((k+2)A),  cos((k+3)A),
 *       tan(kA),  tan((k+1)A),  tan((k+2)A),  tan((k+3)A),
 *       cot(2kA), cot(2(k+1)A), cot(2(k+2)A), cot(2(k+3)A),
 *       sin(2kA), sin(2(k+1)A), sin(2(k+2)A), sin(2(k+3)A)
 *
 *         for k = 4 * BITR( 1, LOG2M ),
 *                 4 * BITR( 2, LOG2M ),
 *                 ...
 *                 4 * BITR( M-2, LOG2M )
 *
 *       cos(kA),  cos((k+1)A),  cos((k+2)A),  cos((k+3)A),
 *       sin(kA),  sin((k+1)A),  sin((k+2)A),  sin((k+3)A),
 *       cos(2kA), cos(2(k+1)A), cos(2(k+2)A), cos(2(k+3)A),
 *       sin(2kA), sin(2(k+1)A), sin(2(k+2)A), sin(2(k+3)A)
 *
 *         for k = 4 * (M - 1)
 *
 *     SETUP->bitrp is a pointer to M unsigned char
 *     bit-reversed index values (LOG2M bits) arranged
 *     as follows:
 *
 *       section 1:
 *         n1 = bitrp[0] = # of elements in section 1
 *         (The first and second elements are not in the table
 *          as they are known to be 0 and M-1, respectively.)
 *
 *         0, M-1, bitrp[1], ... bitrp[n1-2] =
 *             indices that bit-reverse to themselves
 *
 *       section 2:
 *         n2 = bitrp[n1-1] = # of elements in section 2
 *         It's always true that n1 + n2 = M.
 *         (The first element is not in the table and, if
 *          n2 != 0, is known to be 1.)
 *
 *         (1, bitrp[n1]), (bitrp[n1+1], bitrp[n1+2]), ...
 *         (bitrp[M-3], bitrp[M-2]) = n2/2 pairs of indices that
 *             bit-reverse to each other.  bitrp[M-1] = 0.
 *
 * Mercury Computer Systems, Inc.
 * Copyright (c) 1999 All rights reserved
 *
 * Revision    Date      Engineer; Reason
 * --------    ----      ----------------
 * 0.0         991119    jg; Created
 ****************************************************************************/


#include "fft.h"
#include "ppc_vmx.h"

void fft_z( float *Cr, float *Ci, ulong LOG2N, FFT_setup *SETUP )
{
    float *Cr1, *Ci1, *Cr2, *Ci2, *Cr3, *Ci3;
    float *Cr4, *Ci4, *Cr5, *Ci5, *Cr6, *Ci6, *Cr7, *Ci7;
    float *wp0, *wp1, *wp2, *wp3;
    unsigned char *bitrp;
    ulong index, index_bump, index1, index2, windex;
    ulong bflycnt, bflyoff, gcnt, scnt, N;

    VMX_reg a0r, a0i, a1r, a1i, a2r, a2i, a3r, a3i;
    VMX_reg y0r, y0i, y1r, y1i, y2r, y2i, y3r, y3i;
    VMX_reg t1r, t1i, t2r, t2i, m2r, m2i, m3r, m3i;
    VMX_reg p0r, p0i, p1r, p1i, p2r, p2i, p3r, p3i;
    VMX_reg x1r, x1i, x2r, x2i;
    VMX_reg cos1, sin1, cos2, sin2, tan1, cot2;

    VMX_reg a0r_8, a0i_8, a1r_8, a1i_8, a2r_8, a2i_8, a3r_8, a3i_8;
    VMX_reg a4r_8, a4i_8, a5r_8, a5i_8, a6r_8, a6i_8, a7r_8, a7i_8;
    VMX_reg y0r_8, y0i_8, y1r_8, y1i_8, y2r_8, y2i_8, y3r_8, y3i_8;
    VMX_reg y4r_8, y4i_8, y5r_8, y5i_8, y6r_8, y6i_8, y7r_8, y7i_8;
    VMX_reg t1r_8, t1i_8, t2r_8, t2i_8, t3r_8, t3i_8, t4r_8, t4i_8;
    VMX_reg t5r_8, t5i_8, t6r_8, t6i_8, t7r_8, t7i_8, t8r_8, t8i_8;
    VMX_reg d1r_8, d1i_8, d2r_8, d2i_8, m2r_8, m2i_8, m5r_8, m5i_8;
    VMX_reg s1r_8, s1i_8, s2r_8, s2i_8, s3r_8, s3i_8, s4r_8, s4i_8;
    VMX_reg em4r_8, em4i_8, em7r_8, em7i_8, rad2v2;

    /*
     * here if N >= 16
     */
    wp0 = SETUP->twidp;
    wp1 = wp0 + 4;
    wp2 = wp0 + 8;
    wp3 = wp0 + 12;
    bitrp = SETUP->bitrp;
    N = 1 << LOG2N;

    if ( LOG2N & 1 ) {                      /* radix-8 first pass */

        windex = 64;
        LVEWX( rad2v2, wp0, windex )        /* cos(PI/4) = sqrt(2)/2 */
        bflyoff = N >> 1;                   /* 4 * N/8 = N/2 byte offset */
        VSPLTW( rad2v2, rad2v2, 0 )         /* replicate 4 times */

        Cr1 = (float *)((char *)Cr + bflyoff);
        Ci1 = (float *)((char *)Ci + bflyoff);
        Cr2 = (float *)((char *)Cr1 + bflyoff);
        Ci2 = (float *)((char *)Ci1 + bflyoff);
        Cr3 = (float *)((char *)Cr2 + bflyoff);
        Ci3 = (float *)((char *)Ci2 + bflyoff);
        Cr4 = (float *)((char *)Cr3 + bflyoff);
        Ci4 = (float *)((char *)Ci3 + bflyoff);
        Cr5 = (float *)((char *)Cr4 + bflyoff);
        Ci5 = (float *)((char *)Ci4 + bflyoff);
        Cr6 = (float *)((char *)Cr5 + bflyoff);
        Ci6 = (float *)((char *)Ci5 + bflyoff);
        Cr7 = (float *)((char *)Cr6 + bflyoff);
        Ci7 = (float *)((char *)Ci6 + bflyoff);

        index = 0;
        bflycnt = bflyoff;
        while ( bflycnt ) {                 /* while ( index < bflyoff ) { */
            LVX( a0r_8, Cr, index )
            LVX( a0i_8, Ci, index )
            LVX( a1r_8, Cr1, index )
            LVX( a1i_8, Ci1, index )
            LVX( a2r_8, Cr2, index )
            LVX( a2i_8, Ci2, index )
            LVX( a3r_8, Cr3, index )
            LVX( a3i_8, Ci3, index )
            LVX( a4r_8, Cr4, index )
            LVX( a4i_8, Ci4, index )
            LVX( a5r_8, Cr5, index )
            LVX( a5i_8, Ci5, index )
            LVX( a6r_8, Cr6, index )
            LVX( a6i_8, Ci6, index )
            LVX( a7r_8, Cr7, index )
            LVX( a7i_8, Ci7, index )

            VADDFP( t1r_8, a0r_8, a4r_8 )
            VSUBFP( d1r_8, a0r_8, a4r_8 )
            VADDFP( t1i_8, a0i_8, a4i_8 )
            VSUBFP( d1i_8, a0i_8, a4i_8 )
            VADDFP( t3r_8, a1r_8, a5r_8 )
            VSUBFP( t4r_8, a5r_8, a1r_8 )
            VADDFP( t3i_8, a1i_8, a5i_8 )
            VSUBFP( t4i_8, a1i_8, a5i_8 )
            VADDFP( t2r_8, a2r_8, a6r_8 )
            VSUBFP( d2r_8, a6r_8, a2r_8 )
            VADDFP( t2i_8, a2i_8, a6i_8 )
            VSUBFP( d2i_8, a2i_8, a6i_8 )
            VADDFP( t5r_8, a3r_8, a7r_8 )
            VSUBFP( t6r_8, a7r_8, a3r_8 )
            VADDFP( t5i_8, a3i_8, a7i_8 )
            VSUBFP( t6i_8, a3i_8, a7i_8 )

            VADDFP( t7r_8, t1r_8, t2r_8 )
            VSUBFP( m2r_8, t1r_8, t2r_8 )
            VADDFP( t7i_8, t1i_8, t2i_8 )
            VSUBFP( m2i_8, t1i_8, t2i_8 )
            VADDFP( t8r_8, t5r_8, t3r_8 )
            VADDFP( t8i_8, t3i_8, t5i_8 )
            VSUBFP( m5r_8, t3i_8, t5i_8 )
            VSUBFP( m5i_8, t5r_8, t3r_8 )

            VADDFP( y0r_8, t7r_8, t8r_8 )
            VADDFP( y0i_8, t7i_8, t8i_8 )
            VADDFP( y2r_8, m2r_8, m5r_8 )
            VADDFP( y2i_8, m2i_8, m5i_8 )
            VSUBFP( y4r_8, t7r_8, t8r_8 )
            VSUBFP( y4i_8, t7i_8, t8i_8 )
            VSUBFP( y6r_8, m2r_8, m5r_8 )
            VSUBFP( y6i_8, m2i_8, m5i_8 )

            VSUBFP( em4r_8, t6r_8, t4r_8 )
            VSUBFP( em4i_8, t4i_8, t6i_8 )
            VADDFP( em7r_8, t4i_8, t6i_8 )
            VADDFP( em7i_8, t6r_8, t4r_8 )

            VMADDFP( s1r_8, rad2v2, em4r_8, d1r_8 )
            VMADDFP( s1i_8, rad2v2, em4i_8, d1i_8 )
            VNMSUBFP( s2r_8, rad2v2, em4r_8, d1r_8 )
            VNMSUBFP( s2i_8, rad2v2, em4i_8, d1i_8 )
            VMADDFP( s3r_8, rad2v2, em7r_8, d2i_8 )
            VMADDFP( s3i_8, rad2v2, em7i_8, d2r_8 )
            VNMSUBFP( s4r_8, rad2v2, em7r_8, d2i_8 )
            VNMSUBFP( s4i_8, rad2v2, em7i_8, d2r_8 )

            VADDFP( y1r_8, s1r_8, s3r_8 )
            VADDFP( y1i_8, s1i_8, s3i_8 )
            VSUBFP( y3r_8, s2r_8, s4r_8 )
            VSUBFP( y3i_8, s2i_8, s4i_8 )
            VADDFP( y5r_8, s2r_8, s4r_8 )
            VADDFP( y5i_8, s2i_8, s4i_8 )
            VSUBFP( y7r_8, s1r_8, s3r_8 )
            VSUBFP( y7i_8, s1i_8, s3i_8 )

            STVX( y0r_8, Cr, index )        /* bit-reverse output */
            STVX( y0i_8, Ci, index )
            STVX( y2r_8, Cr2, index )
            STVX( y2i_8, Ci2, index )
            STVX( y4r_8, Cr1, index )
            STVX( y4i_8, Ci1, index )
            STVX( y6r_8, Cr3, index )
            STVX( y6i_8, Ci3, index )
            STVX( y1r_8, Cr4, index )
            STVX( y1i_8, Ci4, index )
            STVX( y3r_8, Cr6, index )
            STVX( y3i_8, Ci6, index )
            STVX( y5r_8, Cr5, index )
            STVX( y5i_8, Ci5, index )
            STVX( y7r_8, Cr7, index )
            STVX( y7i_8, Ci7, index )

            index += 16;
            bflycnt -= 16;
        }
    }                                       /* end radix-8 first pass */

    else {                                  /* radix-4 first pass */

        bflyoff = N;                        /* 4 * N/4 = N byte offset */

        Cr1 = (float *)((char *)Cr + bflyoff);
        Ci1 = (float *)((char *)Ci + bflyoff);
        Cr2 = (float *)((char *)Cr1 + bflyoff);
        Ci2 = (float *)((char *)Ci1 + bflyoff);
        Cr3 = (float *)((char *)Cr2 + bflyoff);
        Ci3 = (float *)((char *)Ci2 + bflyoff);

        index = 0;
        bflycnt = bflyoff;
        while ( bflycnt ) {                 /* while ( index < bflyoff ) { */
            LVX( a0r, Cr, index )
            LVX( a0i, Ci, index )
            LVX( a1r, Cr1, index )
            LVX( a1i, Ci1, index )
            LVX( a2r, Cr2, index )
            LVX( a2i, Ci2, index )
            LVX( a3r, Cr3, index )
            LVX( a3i, Ci3, index )

            VADDFP( t1r, a0r, a2r )
            VADDFP( t1i, a0i, a2i )
            VSUBFP( m2r, a0r, a2r )
            VSUBFP( m2i, a0i, a2i )
            VADDFP( t2r, a3r, a1r )
            VADDFP( t2i, a1i, a3i )
            VSUBFP( m3r, a1i, a3i )
            VSUBFP( m3i, a3r, a1r )

            VADDFP( y0r, t1r, t2r )
            VADDFP( y0i, t1i, t2i )
            VADDFP( y1r, m2r, m3r )
            VADDFP( y1i, m2i, m3i )
            VSUBFP( y2r, t1r, t2r )
            VSUBFP( y2i, t1i, t2i )
            VSUBFP( y3r, m2r, m3r )
            VSUBFP( y3i, m2i, m3i )

            STVX( y0r, Cr, index )          /* bit-reverse output */
            STVX( y0i, Ci, index )
            STVX( y1r, Cr2, index )
            STVX( y1i, Ci2, index )
            STVX( y2r, Cr1, index )
            STVX( y2i, Ci1, index )
            STVX( y3r, Cr3, index )
            STVX( y3i, Ci3, index )

            index += 16;
            bflycnt -= 16;
        }
    }                                       /* end radix-4 first pass */

    while ( bflyoff > 64 ) {                /* middle stages */

        index_bump = bflyoff;
        bflyoff >>= 2;                      /* decimate by 4 */
        index_bump -= bflyoff;              /* 3 * bflyoff */

        Cr1 = (float *)((char *)Cr + bflyoff);    /* adjust pointers */
        Ci1 = (float *)((char *)Ci + bflyoff);
        Cr2 = (float *)((char *)Cr1 + bflyoff);
        Ci2 = (float *)((char *)Ci1 + bflyoff);
        Cr3 = (float *)((char *)Cr2 + bflyoff);
        Ci3 = (float *)((char *)Ci2 + bflyoff);

        index = 0;
        bflycnt = bflyoff;
        while ( bflycnt ) {                 /* first (weightless) group */
            LVX( a0r, Cr, index )
            LVX( a0i, Ci, index )
            LVX( a1r, Cr1, index )
            LVX( a1i, Ci1, index )
            LVX( a2r, Cr2, index )
            LVX( a2i, Ci2, index )
            LVX( a3r, Cr3, index )
            LVX( a3i, Ci3, index )

            VADDFP( t1r, a0r, a2r )
            VADDFP( t1i, a0i, a2i )
            VSUBFP( m2r, a0r, a2r )
            VSUBFP( m2i, a0i, a2i )
            VADDFP( t2r, a3r, a1r )
            VADDFP( t2i, a1i, a3i )
            VSUBFP( m3r, a1i, a3i )
            VSUBFP( m3i, a3r, a1r )

            VADDFP( y0r, t1r, t2r )
            VADDFP( y0i, t1i, t2i )
            VADDFP( y1r, m2r, m3r )
            VADDFP( y1i, m2i, m3i )
            VSUBFP( y2r, t1r, t2r )
            VSUBFP( y2i, t1i, t2i )
            VSUBFP( y3r, m2r, m3r )
            VSUBFP( y3i, m2i, m3i )

            STVX( y0r, Cr, index )          /* bit-reverse output */
            STVX( y0i, Ci, index )
            STVX( y1r, Cr2, index )
            STVX( y1i, Ci2, index )
            STVX( y2r, Cr1, index )
            STVX( y2i, Ci1, index )
            STVX( y3r, Cr3, index )
            STVX( y3i, Ci3, index )

            index += 16;
            bflycnt -= 16;
        }                                   /* end of first (weightless) group */

        windex = 64;
        gcnt = N - bflyoff;
        while ( gcnt ) {                    /* loop for remaining groups */
            /*
             * load weights for group
             */
            LVEWX( cos1, wp0, windex )
            LVEWX( tan1, wp1, windex )
            LVEWX( cot2, wp2, windex )
            LVEWX( sin2, wp3, windex )
            VSPLTW( cos1, cos1, 0 )         /* replicate 4 times */
            VSPLTW( tan1, tan1, 0 )
            VSPLTW( cot2, cot2, 0 )
            VSPLTW( sin2, sin2, 0 )

            index += index_bump;
            bflycnt = bflyoff;
            while ( bflycnt ) {
                LVX( a0r, Cr, index )
                LVX( a0i, Ci, index )
                LVX( a1r, Cr1, index )
                LVX( a1i, Ci1, index )
                LVX( a2r, Cr2, index )
                LVX( a2i, Ci2, index )
                LVX( a3r, Cr3, index )
                LVX( a3i, Ci3, index )

                VMADDFP( x1r, cot2, a2r, a2i )
                VNMSUBFP( x1i, cot2, a2i, a2r )
                VMADDFP( x2r, cot2, a3r, a3i )
                VNMSUBFP( x2i, cot2, a3i, a3r )

                VMADDFP( t1r, sin2, x1r, a0r )
                VNMSUBFP( t1i, sin2, x1i, a0i )
                VMADDFP( t2r, sin2, x2r, a1r )
                VNMSUBFP( t2i, sin2, x2i, a1i )

                VNMSUBFP( m2r, sin2, x1r, a0r )
                VMADDFP( m2i, sin2, x1i, a0i )
                VNMSUBFP( m3r, sin2, x2r, a1r )
                VMADDFP( m3i, sin2, x2i, a1i )

                VMADDFP( x1r, tan1, t2i, t2r )
                VNMSUBFP( x1i, tan1, t2r, t2i )
                VNMSUBFP( x2r, tan1, m3r, m3i )
                VMADDFP( x2i, tan1, m3i, m3r )

                VMADDFP( y0r, cos1, x1r, t1r )
                VMADDFP( y0i, cos1, x1i, t1i )
                VMADDFP( y1r, cos1, x2r, m2r )
                VNMSUBFP( y1i, cos1, x2i, m2i )

                VNMSUBFP( y2r, cos1, x1r, t1r )
                VNMSUBFP( y2i, cos1, x1i, t1i )
                VNMSUBFP( y3r, cos1, x2r, m2r )
                VMADDFP( y3i, cos1, x2i, m2i )

                STVX( y0r, Cr, index )      /* bit-reverse output */
                STVX( y0i, Ci, index )
                STVX( y1r, Cr2, index )
                STVX( y1i, Ci2, index )
                STVX( y2r, Cr1, index )
                STVX( y2i, Ci1, index )
                STVX( y3r, Cr3, index )
                STVX( y3i, Ci3, index )

                index += 16;
                bflycnt -= 16;
            }                               /* end of butterfly loop */

            windex += 64;                   /* bump weight index */
            gcnt -= bflyoff;
        }                                   /* end of group loop */
    }                                       /* end of stage loop */



    if ( bflyoff == 64 ) {                  /* penultimate stage */

        Cr1 = (float *)((char *)Cr + 16);   /* adjust pointers */
        Ci1 = (float *)((char *)Ci + 16);
        Cr2 = (float *)((char *)Cr1 + 16);
        Ci2 = (float *)((char *)Ci1 + 16);
        Cr3 = (float *)((char *)Cr2 + 16);
        Ci3 = (float *)((char *)Ci2 + 16);

        index = 0;                          /* same as windex */

        /*
         * first group (4 butterflies) is weightless
         */
        LVX( a0r, Cr, index )
        LVX( a0i, Ci, index )
        LVX( a1r, Cr1, index )
        LVX( a1i, Ci1, index )
        LVX( a2r, Cr2, index )
        LVX( a2i, Ci2, index )
        LVX( a3r, Cr3, index )
        LVX( a3i, Ci3, index )

        VADDFP( t1r, a0r, a2r )
        VADDFP( t1i, a0i, a2i )
        VSUBFP( m2r, a0r, a2r )
        VSUBFP( m2i, a0i, a2i )
        VADDFP( t2r, a3r, a1r )
        VADDFP( t2i, a1i, a3i )
        VSUBFP( m3r, a1i, a3i )
        VSUBFP( m3i, a3r, a1r )

        VADDFP( y0r, t1r, t2r )
        VADDFP( y0i, t1i, t2i )
        VADDFP( y1r, m2r, m3r )
        VADDFP( y1i, m2i, m3i )
        VSUBFP( y2r, t1r, t2r )
        VSUBFP( y2i, t1i, t2i )
        VSUBFP( y3r, m2r, m3r )
        VSUBFP( y3i, m2i, m3i )

        STVX( y0r, Cr, index )              /* bit-reverse output */
        STVX( y0i, Ci, index )
        STVX( y1r, Cr2, index )
        STVX( y1i, Ci2, index )
        STVX( y2r, Cr1, index )
        STVX( y2i, Ci1, index )
        STVX( y3r, Cr3, index )
        STVX( y3i, Ci3, index )

        /*
         * loop for remaining butterflies except the very last
         */
        bflycnt = N - 32;
        while ( bflycnt ) {
            index += 64;

            /*
             * load weights for group
             */
            LVEWX( cos1, wp0, index )
            LVEWX( tan1, wp1, index )
            LVEWX( cot2, wp2, index )
            LVEWX( sin2, wp3, index )
            VSPLTW( cos1, cos1, 0 )         /* replicate 4 times */
            VSPLTW( tan1, tan1, 0 )
            VSPLTW( cot2, cot2, 0 )
            VSPLTW( sin2, sin2, 0 )

            LVX( a0r, Cr, index )
            LVX( a0i, Ci, index )
            LVX( a1r, Cr1, index )
            LVX( a1i, Ci1, index )
            LVX( a2r, Cr2, index )
            LVX( a2i, Ci2, index )
            LVX( a3r, Cr3, index )
            LVX( a3i, Ci3, index )

            VMADDFP( x1r, cot2, a2r, a2i )
            VNMSUBFP( x1i, cot2, a2i, a2r )
            VMADDFP( x2r, cot2, a3r, a3i )
            VNMSUBFP( x2i, cot2, a3i, a3r )

            VMADDFP( t1r, sin2, x1r, a0r )
            VNMSUBFP( t1i, sin2, x1i, a0i )
            VMADDFP( t2r, sin2, x2r, a1r )
            VNMSUBFP( t2i, sin2, x2i, a1i )

            VNMSUBFP( m2r, sin2, x1r, a0r )
            VMADDFP( m2i, sin2, x1i, a0i )
            VNMSUBFP( m3r, sin2, x2r, a1r )
            VMADDFP( m3i, sin2, x2i, a1i )

            VMADDFP( x1r, tan1, t2i, t2r )
            VNMSUBFP( x1i, tan1, t2r, t2i )
            VNMSUBFP( x2r, tan1, m3r, m3i )
            VMADDFP( x2i, tan1, m3i, m3r )

            VMADDFP( y0r, cos1, x1r, t1r )
            VMADDFP( y0i, cos1, x1i, t1i )
            VMADDFP( y1r, cos1, x2r, m2r )
            VNMSUBFP( y1i, cos1, x2i, m2i )

            VNMSUBFP( y2r, cos1, x1r, t1r )
            VNMSUBFP( y2i, cos1, x1i, t1i )
            VNMSUBFP( y3r, cos1, x2r, m2r )
            VMADDFP( y3i, cos1, x2i, m2i )

            STVX( y0r, Cr, index )          /* bit-reverse output */
            STVX( y0i, Ci, index )
            STVX( y1r, Cr2, index )
            STVX( y1i, Ci2, index )
            STVX( y2r, Cr1, index )
            STVX( y2i, Ci1, index )
            STVX( y3r, Cr3, index )
            STVX( y3i, Ci3, index )

            bflycnt -= 16;
        }                                   /* end of butterfly loop */

        /*
         * very last butterfly uses cosine/sine weights for accuracy
         */
        index += 64;

        LVEWX( cos1, wp0, index )
        LVEWX( sin1, wp1, index )
        LVEWX( cos2, wp2, index )
        LVEWX( sin2, wp3, index )
        VSPLTW( cos1, cos1, 0 )             /* replicate 4 times */
        VSPLTW( sin1, sin1, 0 )
        VSPLTW( cos2, cos2, 0 )
        VSPLTW( sin2, sin2, 0 )

        LVX( a1r, Cr1, index )
        LVX( a1i, Ci1, index )
        LVX( a2r, Cr2, index )
        LVX( a2i, Ci2, index )
        LVX( a3r, Cr3, index )
        LVX( a3i, Ci3, index )
        LVX( a0r, Cr, index )
        LVX( a0i, Ci, index )

        VMADDFP( t1r, cos2, a2r, a0r )
        VMADDFP( t1i, cos2, a2i, a0i )
        VNMSUBFP( m2r, cos2, a2r, a0r )
        VNMSUBFP( m2i, cos2, a2i, a0i )

        VMADDFP( t1r, sin2, a2i, t1r )
        VNMSUBFP( t1i, sin2, a2r, t1i )
        VNMSUBFP( m2r, sin2, a2i, m2r )
        VMADDFP( m2i, sin2, a2r, m2i )

        VMADDFP( t2r, cos2, a3r, a1r )
        VMADDFP( t2i, cos2, a3i, a1i )
        VNMSUBFP( m3r, cos2, a3r, a1r )
        VNMSUBFP( m3i, cos2, a3i, a1i )

        VMADDFP( t2r, sin2, a3i, t2r )
        VNMSUBFP( t2i, sin2, a3r, t2i )
        VNMSUBFP( m3r, sin2, a3i, m3r )
        VMADDFP( m3i, sin2, a3r, m3i )

        VMADDFP( y0r, cos1, t2r, t1r )
        VMADDFP( y0i, cos1, t2i, t1i )
        VNMSUBFP( y2r, cos1, t2r, t1r )
        VNMSUBFP( y2i, cos1, t2i, t1i )

        VMADDFP( y0r, sin1, t2i, y0r )
        VNMSUBFP( y0i, sin1, t2r, y0i )
        VNMSUBFP( y2r, sin1, t2i, y2r )
        VMADDFP( y2i, sin1, t2r, y2i )

        VNMSUBFP( y1r, sin1, m3r, m2r )
        VNMSUBFP( y1i, sin1, m3i, m2i )
        VMADDFP( y3r, sin1, m3r, m2r )
        VMADDFP( y3i, sin1, m3i, m2i )

        VMADDFP( y1r, cos1, m3i, y1r )
        VNMSUBFP( y1i, cos1, m3r, y1i )
        VNMSUBFP( y3r, cos1, m3i, y3r )
        VMADDFP( y3i, cos1, m3r, y3i )

        STVX( y0r, Cr, index )              /* bit-reverse output */
        STVX( y0i, Ci, index )
        STVX( y1r, Cr2, index )
        STVX( y1i, Ci2, index )
        STVX( y2r, Cr1, index )
        STVX( y2i, Ci1, index )
        STVX( y3r, Cr3, index )
        STVX( y3i, Ci3, index )
    }                                       /* end penultimate pass */



/*
 * final pass
 */

Cr1 = (float *)((char *)Cr + N);     /* adjust pointers */
Ci1 = (float *)((char *)Ci + N);
Cr2 = (float *)((char *)Cr1 + N);
Ci2 = (float *)((char *)Ci1 + N);
Cr3 = (float *)((char *)Cr2 + N);
Ci3 = (float *)((char *)Ci2 + N);



bflycnt = (ulong) *bitrp; 
windex = 0;
index = 0;

scnt = (bflycnt == 1) ? 1 : 2;
bflycnt -= scnt;



/*
 * loop for in-place butterflies using cosine/sine weights (at most 2)
 */

while ( scnt ) {

LVX( a0r, Cr, index )
LVX( a0i, Ci, index )
LVX( a1r, Cr1, index )
LVX( a1i, Ci1, index )
LVX( a2r, Cr2, index )
LVX( a2i, Ci2, index )
LVX( a3r, Cr3, index )
LVX( a3i, Ci3, index )

LVX( cos1, wp0, windex )
LVX( sin1, wp1, windex )
LVX( cos2, wp2, windex )
LVX( sin2, wp3, windex )



/*
 * perform two (real and imaginary) 4 x 4 permutes
 * but swapping the resulting 2 middle columns
 */

VMRGHW ( p0r, a0r, a1r )
VMRGHW ( p0i, a0i, a1i )
VMRGHW ( p1r, a2r, a3r )
VMRGHW ( p1i, a2i, a3i )
VMRGLW ( p2r, a0r, a1r )
VMRGLW ( p2i, a0i, a1i )
VMRGLW ( p3r, a2r, a3r )
VMRGLW ( p3i, a2i, a3i )

VMRGHW ( a0r, p0r, p1r )
VMRGHW ( a0i, p0i, p1i )
VMRGLW ( a1r, p0r, p1r )
VMRGLW ( a1i, p0i, p1i )
VMRGHW ( a2r, p2r, p3r )
VMRGHW ( a2i, p2i, p3i )
VMRGLW ( a3r, p2r, p3r )
VMRGLW ( a3i, p2i, p3i )

VMADDFP ( t1r, cos2, a2r, a0r )
VMADDFP ( t1i, cos2, a2i, a0i )
VNMSUBFP( m2r, cos2, a2r, a0r )
VNMSUBFP( m2i, cos2, a2i, a0i )

VMADDFP ( t1r, sin2, a2i, t1r )
VNMSUBFP( t1i, sin2, a2r, t1i )
VNMSUBFP( m2r, sin2, a2i, m2r )
VMADDFP ( m2i, sin2, a2r, m2i )

VMADDFP ( t2r, cos2, a3r, a1r )
VMADDFP ( t2i, cos2, a3i, a1i )
VNMSUBFP( m3r, cos2, a3r, a1r )
VNMSUBFP( m3i, cos2, a3i, a1i )

VMADDFP ( t2r, sin2, a3i, t2r )
VNMSUBFP( t2i, sin2, a3r, t2i )
VNMSUBFP( m3r, sin2, a3i, m3r )
VMADDFP ( m3i, sin2, a3r, m3i )

VMADDFP ( y0r, cos1, t2r, t1r )
VMADDFP ( y0i, cos1, t2i, t1i )
VNMSUBFP( y2r, cos1, t2r, t1r )
VNMSUBFP( y2i, cos1, t2i, t1i )

VMADDFP ( y0r, sin1, t2i, y0r )
VNMSUBFP( y0i, sin1, t2r, y0i )
VNMSUBFP( y2r, sin1, t2i, y2r )
VMADDFP ( y2i, sin1, t2r, y2i )

VNMSUBFP( y1r, sin1, m3r, m2r )
VNMSUBFP( y1i, sin1, m3i, m2i )
VMADDFP ( y3r, sin1, m3r, m2r )
VMADDFP ( y3i, sin1, m3i, m2i )

VMADDFP ( y1r, cos1, m3i, y1r )
VNMSUBFP( y1i, cos1, m3r, y1i )
VNMSUBFP( y3r, cos1, m3i, y3r )
VMADDFP ( y3i, cos1, m3r, y3i )

STVX( y0r, Cr, index )       /* no bit-reversal ! */
STVX( y0i, Ci, index )
STVX( y1r, Cr1, index )
STVX( y1i, Ci1, index )
STVX( y2r, Cr2, index )
STVX( y2i, Ci2, index )
STVX( y3r, Cr3, index )
STVX( y3i, Ci3, index )

index += 16;
windex = index << 2;
scnt -= 1;

}                            /* end butterfly loop */

index = (ulong) *++bitrp;
windex = index << 6;
index <<= 4;

/*
 * loop for remaining in-place butterflies (uses tan, cot weights)
 */

while ( bflycnt ) {

LVX( a0r, Cr, index )
LVX( a0i, Ci, index )
LVX( a1r, Cr1, index )
LVX( a1i, Ci1, index )
LVX( a2r, Cr2, index )
LVX( a2i, Ci2, index )
LVX( a3r, Cr3, index )
LVX( a3i, Ci3, index )

LVX( cos1, wp0, windex )
LVX( tan1, wp1, windex )
LVX( cot2, wp2, windex )
LVX( sin2, wp3, windex )

/*
 * perform two (real and imaginary) 4 x 4 permutes
 * but swapping the resulting 2 middle columns
 */
VMRGHW ( p0r, a0r, a1r )
VMRGHW ( p0i, a0i, a1i )
VMRGHW ( p1r, a2r, a3r )
VMRGHW ( p1i, a2i, a3i )
VMRGLW ( p2r, a0r, a1r )
VMRGLW ( p2i, a0i, a1i )
VMRGLW ( p3r, a2r, a3r )
VMRGLW ( p3i, a2i, a3i )

VMRGHW ( a0r, p0r, p1r )
VMRGHW ( a0i, p0i, p1i )
VMRGLW ( a1r, p0r, p1r )
VMRGLW ( a1i, p0i, p1i )

VMRGHW ( a2r, p2r, p3r )
VMRGHW ( a2i, p2i, p3i )
VMRGLW ( a3r, p2r, p3r )
VMRGLW ( a3i, p2i, p3i )

VMADDFP ( x1r, cot2, a2r, a2i )
VNMSUBFP( x1i, cot2, a2i, a2r )
VMADDFP ( x2r, cot2, a3r, a3i )
VNMSUBFP( x2i, cot2, a3i, a3r )

VMADDFP ( t1r, sin2, x1r, a0r )
VNMSUBFP( t1i, sin2, x1i, a0i )
VMADDFP ( t2r, sin2, x2r, a1r )
VNMSUBFP( t2i, sin2, x2i, a1i )

VNMSUBFP( m2r, sin2, x1r, a0r )
VMADDFP ( m2i, sin2, x1i, a0i )
VNMSUBFP( m3r, sin2, x2r, a1r )
VMADDFP ( m3i, sin2, x2i, a1i )

VMADDFP ( x1r, tan1, t2i, t2r )
VNMSUBFP( x1i, tan1, t2r, t2i )
VNMSUBFP( x2r, tan1, m3r, m3i )
VMADDFP ( x2i, tan1, m3i, m3r )

VMADDFP ( y0r, cos1, x1r, t1r )
VMADDFP ( y0i, cos1, x1i, t1i )
VMADDFP ( y1r, cos1, x2r, m2r )
VNMSUBFP( y1i, cos1, x2i, m2i )

VNMSUBFP( y2r, cos1, x1r, t1r )
VNMSUBFP( y2i, cos1, x1i, t1i )
VNMSUBFP( y3r, cos1, x2r, m2r )
VMADDFP ( y3i, cos1, x2i, m2i )

STVX( y0r, Cr, index )       /* no bit-reversal ! */
STVX( y0i, Ci, index )
STVX( y1r, Cr1, index )
STVX( y1i, Ci1, index )
STVX( y2r, Cr2, index )
STVX( y2i, Ci2, index )
STVX( y3r, Cr3, index )
STVX( y3i, Ci3, index )

index = (ulong) *++bitrp;
bflycnt -= 1;
windex = index << 6;
index <<= 4;

}                            /* end butterfly loop */



/*
 * loop for out-of-place butterflies
 */

bflycnt = index >> 4;        /* count of bit-reverse indices */
windex = 64;
index1 = 16;
while ( bflycnt ) {

LVX( cos1, wp0, windex )
LVX( tan1, wp1, windex )
LVX( cot2, wp2, windex )
LVX( sin2, wp3, windex )

LVX( a0r, Cr, index1 )
LVX( a0i, Ci, index1 )
LVX( a1r, Cr1, index1 )
LVX( a1i, Ci1, index1 )
LVX( a2r, Cr2, index1 )
LVX( a2i, Ci2, index1 )
LVX( a3r, Cr3, index1 )
LVX( a3i, Ci3, index1 )



/*
 * perform two (real and imaginary) 4 x 4 permutes
 * but swapping the resulting 2 middle columns
 */

VMRGHW ( p0r, a0r, a1r )
VMRGHW ( p0i, a0i, a1i )
VMRGHW ( p1r, a2r, a3r )
VMRGHW ( p1i, a2i, a3i )
VMRGLW ( p2r, a0r, a1r )
VMRGLW ( p2i, a0i, a1i )
VMRGLW ( p3r, a2r, a3r )
VMRGLW ( p3i, a2i, a3i )

VMRGHW ( a0r, p0r, p1r )
VMRGHW ( a0i, p0i, p1i )
VMRGLW ( a1r, p0r, p1r )
VMRGLW ( a1i, p0i, p1i )
VMRGHW ( a2r, p2r, p3r )
VMRGHW ( a2i, p2i, p3i )
VMRGLW ( a3r, p2r, p3r )
VMRGLW ( a3i, p2i, p3i )

VMADDFP ( x1r, cot2, a2r, a2i )
VNMSUBFP( x1i, cot2, a2i, a2r )
VMADDFP ( x2r, cot2, a3r, a3i )
VNMSUBFP( x2i, cot2, a3i, a3r )

VMADDFP ( t1r, sin2, x1r, a0r )
VNMSUBFP( t1i, sin2, x1i, a0i )
VMADDFP ( t2r, sin2, x2r, a1r )
VNMSUBFP( t2i, sin2, x2i, a1i )

VNMSUBFP( m2r, sin2, x1r, a0r )
VMADDFP ( m2i, sin2, x1i, a0i )
VNMSUBFP( m3r, sin2, x2r, a1r )
VMADDFP ( m3i, sin2, x2i, a1i )

VMADDFP ( x1r, tan1, t2i, t2r )
VNMSUBFP( x1i, tan1, t2r, t2i )
VNMSUBFP( x2r, tan1, m3r, m3i )
VMADDFP ( x2i, tan1, m3i, m3r )

VMADDFP ( y0r, cos1, x1r, t1r )
VMADDFP ( y0i, cos1, x1i, t1i )
VMADDFP ( y1r, cos1, x2r, m2r )
VNMSUBFP( y1i, cos1, x2i, m2i )

VNMSUBFP( y2r, cos1, x1r, t1r )
VNMSUBFP( y2i, cos1, x1i, t1i )
VNMSUBFP( y3r, cos1, x2r, m2r )
VMADDFP ( y3i, cos1, x2i, m2i )

index2 = (ulong) *++bitrp;
windex = index2 << 6;
index2 <<= 4;

LVX( cos1, wp0, windex )
LVX( tan1, wp1, windex )
LVX( cot2, wp2, windex )
LVX( sin2, wp3, windex )

LVX( a0r, Cr, index2 )
LVX( a0i, Ci, index2 )
LVX( a1r, Cr1, index2 )
LVX( a1i, Ci1, index2 )
LVX( a2r, Cr2, index2 )
LVX( a2i, Ci2, index2 )
LVX( a3r, Cr3, index2 )
LVX( a3i, Ci3, index2 )

STVX( y0r, Cr, index2 )      /* no bit-reversal ! */
STVX( y0i, Ci, index2 )
STVX( y1r, Cr1, index2 )
STVX( y1i, Ci1, index2 )
STVX( y2r, Cr2, index2 )
STVX( y2i, Ci2, index2 )
STVX( y3r, Cr3, index2 )
STVX( y3i, Ci3, index2 )

/*
 * perform two (real and imaginary) 4 x 4 permutes
 * but swapping the resulting 2 middle columns
 */

VMRGHW ( p0r, a0r, a1r )
VMRGHW ( p0i, a0i, a1i )
VMRGHW ( p1r, a2r, a3r )
VMRGHW ( p1i, a2i, a3i )
VMRGLW ( p2r, a0r, a1r )
VMRGLW ( p2i, a0i, a1i )
VMRGLW ( p3r, a2r, a3r )
VMRGLW ( p3i, a2i, a3i )

VMRGHW ( a0r, p0r, p1r )
VMRGHW ( a0i, p0i, p1i )
VMRGLW ( a1r, p0r, p1r )
VMRGLW ( a1i, p0i, p1i )
VMRGHW ( a2r, p2r, p3r )
VMRGHW ( a2i, p2i, p3i )
VMRGLW ( a3r, p2r, p3r )
VMRGLW ( a3i, p2i, p3i )

VMADDFP ( x1r, cot2, a2r, a2i )
VNMSUBFP( x1i, cot2, a2i, a2r )
VMADDFP ( x2r, cot2, a3r, a3i )
VNMSUBFP( x2i, cot2, a3i, a3r )

VMADDFP ( t1r, sin2, x1r, a0r )
VNMSUBFP( t1i, sin2, x1i, a0i )
VMADDFP ( t2r, sin2, x2r, a1r )
VNMSUBFP( t2i, sin2, x2i, a1i )

VNMSUBFP( m2r, sin2, x1r, a0r )
VMADDFP ( m2i, sin2, x1i, a0i )
VNMSUBFP( m3r, sin2, x2r, a1r )
VMADDFP ( m3i, sin2, x2i, a1i )

VMADDFP ( x1r, tan1, t2i, t2r )
VNMSUBFP( x1i, tan1, t2r, t2i )
VNMSUBFP( x2r, tan1, m3r, m3i )
VMADDFP ( x2i, tan1, m3i, m3r )

VMADDFP ( y0r, cos1, x1r, t1r )
VMADDFP ( y0i, cos1, x1i, t1i )
VMADDFP ( y1r, cos1, x2r, m2r )
VNMSUBFP( y1i, cos1, x2i, m2i )

VNMSUBFP( y2r, cos1, x1r, t1r )
VNMSUBFP( y2i, cos1, x1i, t1i )
VNMSUBFP( y3r, cos1, x2r, m2r )
VMADDFP ( y3i, cos1, x2i, m2i )

STVX( y0r, Cr, index1 )      /* no bit-reversal ! */
STVX( y0i, Ci, index1 )
STVX( y1r, Cr1, index1 )
STVX( y1i, Ci1, index1 )
STVX( y2r, Cr2, index1 )
STVX( y2i, Ci2, index1 )







STVX( y3r, Cr3, index1 )
STVX( y3i, Ci3, index1 )

index1 = (ulong) *++bitrp;
windex = index1 << 6;
index1 <<= 4;

bflycnt -= 2;

}                            /* end butterfly loop */
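The tan/cot loops in the final pass factor each weight instead of applying its cosine and sine separately: w = cos - i*sin = sin*(cot - i). This yields both a0 + w*a2 and a0 - w*a2 in six fused operations rather than eight, and a full butterfly in 24 rather than 32. The scalar sketch below illustrates the first-stage (cot2/sin2) pattern only and is not part of the original listing. The names `madd`, `nmsub`, `bfly2_cot` and `bfly2_cot_self_check` are hypothetical. The factorization needs a nonzero sine, which is presumably one reason the listing handles the first butterflies with plain cosine/sine weights.

```c
#include <assert.h>

/* One-lane stand-ins for the fused vector operations (hypothetical names). */
float madd(float a, float c, float b)  { return a * c + b; }     /* vmaddfp:  a*c + b    */
float nmsub(float a, float c, float b) { return -(a * c - b); }  /* vnmsubfp: -(a*c - b) */

/* t = a0 + w*a2 and m = a0 - w*a2, where w = sinw * (cotw - i).
 * xr holds Re[(cot - i)*a2]; xi holds the negated imaginary part,
 * matching the sign convention of the x1r/x1i lines in the listing. */
void bfly2_cot(float cotw, float sinw,
               float a0r, float a0i, float a2r, float a2i,
               float *tr, float *ti, float *mr, float *mi)
{
    float xr = madd (cotw, a2r, a2i);
    float xi = nmsub(cotw, a2i, a2r);

    *tr = madd (sinw, xr, a0r);
    *ti = nmsub(sinw, xi, a0i);
    *mr = nmsub(sinw, xr, a0r);
    *mi = madd (sinw, xi, a0i);
}

static float absf(float x) { return x < 0.0f ? -x : x; }

/* Compares the factored form with direct complex arithmetic; returns 1 on agreement. */
int bfly2_cot_self_check(void)
{
    const float c = 0.8f, s = 0.6f;              /* cosine and sine of one angle */
    const float a0r = 1.0f, a0i = 2.0f, a2r = 3.0f, a2i = -4.0f;
    const float pr = c * a2r + s * a2i;          /* Re[(c - i s) * a2] */
    const float pi = c * a2i - s * a2r;          /* Im[(c - i s) * a2] */
    float tr, ti, mr, mi;

    bfly2_cot(c / s, s, a0r, a0i, a2r, a2i, &tr, &ti, &mr, &mi);

    assert(absf(tr - (a0r + pr)) < 1e-4f);
    assert(absf(ti - (a0i + pi)) < 1e-4f);
    assert(absf(mr - (a0r - pr)) < 1e-4f);
    assert(absf(mi - (a0i - pi)) < 1e-4f);
    return 1;
}
```

The second stage of the listing uses the companion factorization w = cos*(1 - i*tan), which is why the weight tables hold cos1, tan1, cot2 and sin2.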



/*
 * File Name:   ppc_vmx.c
 * Description: Contains C functions that emulate PPC vmx
 *              (altivec) instructions
 *
 * Mercury Computer Systems, Inc.
 * Copyright (c) 1999 All rights reserved
 *
 * Revision   Date     Engineer; Reason
 *   0.0      991119   jg; Created
 */

#include "ppc_vmx.h"

long CR[ 8 ];                /* condition register */

void _lvewx( VMX_reg *vT, ulong rA, ulong rB )
{
    ulong *addr;
    ulong i;

    addr = (ulong *)((rA) + (rB));
    i = ((ulong)addr & 0xc) >> 2;        /* word lane selected by address bits 2-3 */
    (vT)->ul[i] = *addr;
}

void _lvx( VMX_reg *vT, ulong rA, ulong rB )
{
    ulong *addr;
    ulong i;

    addr = (ulong *)(((rA) + (rB)) & ~15);   /* force 16-byte alignment */
    for ( i = 0; i < 4; i++ )
        (vT)->ul[i] = addr[i];
}

void _stvewx( VMX_reg *vS, ulong rA, ulong rB )
{
    ulong *addr;
    ulong i;

    addr = (ulong *)((rA) + (rB));
    i = ((ulong)addr & 0xc) >> 2;        /* word lane selected by address bits 2-3 */
    *addr = (vS)->ul[i];
}

void _stvx( VMX_reg *vS, ulong rA, ulong rB )
{
    ulong *addr;
    ulong i;

    addr = (ulong *)(((rA) + (rB)) & ~15);   /* force 16-byte alignment */
    for ( i = 0; i < 4; i++ )
        addr[i] = (vS)->ul[i];
}
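The AltiVec `lvx` and `stvx` instructions ignore the low four bits of the effective address, which in C corresponds to masking with `~15`. The sketch below is an illustration only, with hypothetical helper names. It contrasts that mask with `-15`, an easy misreading of the same characters that clears only bits 1 to 3 and so leaves odd addresses unaligned.

```c
#include <assert.h>
#include <stdint.h>

/* Clears the low four address bits, giving a 16-byte-aligned address. */
uintptr_t align16(uintptr_t ea)
{
    return ea & ~(uintptr_t)15;
}

/* For contrast: -15 is ...FFF1 in two's complement, so bit 0 survives. */
uintptr_t mask_minus15(uintptr_t ea)
{
    return ea & (uintptr_t)-15;
}

/* Returns 1 when both masks behave as described above. */
int align16_self_check(void)
{
    assert(align16((uintptr_t)0x1230) == (uintptr_t)0x1230);
    assert(align16((uintptr_t)0x1237) == (uintptr_t)0x1230);
    assert(align16((uintptr_t)0x123F) == (uintptr_t)0x1230);
    assert(mask_minus15((uintptr_t)0x1237) == (uintptr_t)0x1231);
    return 1;
}
```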

void _vaddfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB )
{
    ulong i;

    for ( i = 0; i < 4; i++ )
        (vT)->f[i] = (vA)->f[i] + (vB)->f[i];
}

void _vmaddfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB )
{
    ulong i;

    for ( i = 0; i < 4; i++ )
        (vT)->f[i] = ((vA)->f[i] * (vC)->f[i]) + (vB)->f[i];
}

void _vmrghw( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB )
{
    VMX_reg v;
    ulong i, j;

    for ( i = 0; i < 2; i++ ) {
        j = i + i;
        v.ul[j] = (vA)->ul[i];
        v.ul[(j+1)] = (vB)->ul[i];
    }

    for ( i = 0; i < 4; i++ )
        (vT)->ul[i] = v.ul[i];
}

void _vmrglw( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB )
{
    VMX_reg v;
    ulong i, j;

    for ( i = 0; i < 2; i++ ) {
        j = i + i;
        v.ul[j] = (vA)->ul[(2+i)];
        v.ul[(j+1)] = (vB)->ul[(2+i)];
    }

    for ( i = 0; i < 4; i++ )
        (vT)->ul[i] = v.ul[i];
}
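Two rounds of these merges give the "4 x 4 permute with the 2 middle columns swapped" used in the final pass: output row k holds input column k, read in row order 0, 2, 1, 3. The sketch below is an illustration only and is not part of the original file. The type `vec4` and the functions `mrgh`, `mrgl`, `permute4x4` and `permute4x4_self_check` are hypothetical stand-ins for `VMX_reg` and the emulation routines.

```c
#include <assert.h>

typedef struct { unsigned long w[4]; } vec4;   /* hypothetical 4-word register */

/* vmrghw: interleave the first two words of a and b. */
vec4 mrgh(vec4 a, vec4 b)
{
    vec4 r = {{ a.w[0], b.w[0], a.w[1], b.w[1] }};
    return r;
}

/* vmrglw: interleave the last two words of a and b. */
vec4 mrgl(vec4 a, vec4 b)
{
    vec4 r = {{ a.w[2], b.w[2], a.w[3], b.w[3] }};
    return r;
}

/* Same merge sequence as the listing, applied to one 4 x 4 block in place. */
void permute4x4(vec4 a[4])
{
    vec4 p0 = mrgh(a[0], a[1]);
    vec4 p1 = mrgh(a[2], a[3]);
    vec4 p2 = mrgl(a[0], a[1]);
    vec4 p3 = mrgl(a[2], a[3]);

    a[0] = mrgh(p0, p1);
    a[1] = mrgl(p0, p1);
    a[2] = mrgh(p2, p3);
    a[3] = mrgl(p2, p3);
}

/* Fills element (r, c) with 10*r + c, permutes, and checks that
 * output (r, c) equals input (order[c], r); returns 1 on success. */
int permute4x4_self_check(void)
{
    static const unsigned long order[4] = { 0, 2, 1, 3 };
    vec4 a[4];
    unsigned long r, c;

    for (r = 0; r < 4; r++)
        for (c = 0; c < 4; c++)
            a[r].w[c] = 10 * r + c;

    permute4x4(a);

    for (r = 0; r < 4; r++)
        for (c = 0; c < 4; c++)
            assert(a[r].w[c] == 10 * order[c] + r);
    return 1;
}
```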

void _vmsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB )
{
    ulong i;

    for ( i = 0; i < 4; i++ )
        (vT)->f[i] = ((vA)->f[i] * (vC)->f[i]) - (vB)->f[i];
}

void _vnmsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB )
{
    ulong i;

    for ( i = 0; i < 4; i++ )
        (vT)->f[i] = -(((vA)->f[i] * (vC)->f[i]) - (vB)->f[i]);
}

void _vslw( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB )
{
    ulong i, sh;

    for ( i = 0; i < 4; i++ ) {
        sh = (vB)->ul[i] & (ulong)0x1f;      /* shift count is the low 5 bits */
        (vT)->ul[i] = (vA)->ul[i] << sh;
    }
}

void _vspltisw( VMX_reg *vT, long SIMM )
{
    ulong i;

    for ( i = 0; i < 4; i++ )
        (vT)->l[i] = (long)(SIMM);
}

void _vspltw( VMX_reg *vT, VMX_reg *vB, ulong UIMM )
{
    ulong i, ul;

    ul = (vB)->ul[(UIMM) & 0x3];
    for ( i = 0; i < 4; i++ )
        (vT)->ul[i] = ul;
}

void _vsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB )
{
    ulong i;

    for ( i = 0; i < 4; i++ )
        (vT)->f[i] = (vA)->f[i] - (vB)->f[i];
}

void _vxor( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB )
{
    ulong i;

    for ( i = 0; i < 4; i++ )
        (vT)->ul[i] = (vA)->ul[i] ^ (vB)->ul[i];
}


/*
 * File Name:   ppc_vmx.h
 * Description: Header file for PPC vmx (altivec) emulation
 *
 * Copyright (c) 1999 All rights reserved
 *
 * Revision   Date     Engineer; Reason
 *   0.0      991119   jg; Created
 */



#define uchar   unsigned char
#define ushort  unsigned short
#define ulong   unsigned long



/*
 * define a structure to represent a VMX (SIMD) register
 */

typedef union {
    char   c[16];
    uchar  uc[16];
    short  s[8];
    ushort us[8];
    long   l[4];
    ulong  ul[4];
    float  f[4];
} VMX_reg;



/*
 * condition register comprised of 8 4-bit fields (0 - 7)
 */

extern long CR[];

/*
 * vmx instruction macros
 */
#define LVEWX( vT, rA, rB )          _lvewx( &vT, (ulong)rA, (ulong)rB );
#define LVX( vT, rA, rB )            _lvx( &vT, (ulong)rA, (ulong)rB );
#define STVEWX( vS, rA, rB )         _stvewx( &vS, (ulong)rA, (ulong)rB );
#define STVX( vS, rA, rB )           _stvx( &vS, (ulong)rA, (ulong)rB );
#define VADDFP( vT, vA, vB )         _vaddfp( &vT, &vA, &vB );
#define VMADDFP( vT, vA, vC, vB )    _vmaddfp( &vT, &vA, &vC, &vB );
#define VMRGHW( vT, vA, vB )         _vmrghw( &vT, &vA, &vB );
#define VMRGLW( vT, vA, vB )         _vmrglw( &vT, &vA, &vB );
#define VMSUBFP( vT, vA, vC, vB )    _vmsubfp( &vT, &vA, &vC, &vB );
#define VNMSUBFP( vT, vA, vC, vB )   _vnmsubfp( &vT, &vA, &vC, &vB );
#define VSLW( vT, vA, vB )           _vslw( &vT, &vA, &vB );
#define VSPLTW( vT, vB, UIMM )       _vspltw( &vT, &vB, UIMM );
#define VSPLTISW( vT, SIMM )         _vspltisw( &vT, SIMM );
#define VSUBFP( vT, vA, vB )         _vsubfp( &vT, &vA, &vB );
#define VXOR( vT, vA, vB )           _vxor( &vT, &vA, &vB );